Abstract

Highly-parallel graphics processing units (GPUs) can improve the speed of
micromagnetic simulations significantly as compared to conventional computing
using central processing units (CPUs). We present a strategy for performing
GPU-accelerated micromagnetic simulations by utilizing cost-effective GPU
access offered by cloud computing services with an open-source Python-based
program for running the MuMax3 micromagnetics code remotely. We analyze
the scaling and cost benefits of using cloud computing for micromagnetics.
1. Introduction

Micromagnetic simulations provide quantitative predictions for complex
magnetic physics[1], including the influences of demagnetization, spin-transfer
torque[2], and the Dzyaloshinskii-Moriya interaction [3, 4]. Recent advances in
graphics processing units (GPUs) have prompted the integration of such computing
capacity into micromagnetic packages [5-10]. The massively-parallel character of
GPUs is particularly well-suited to accelerating large finite-difference
calculations, such as the simulation of magnetization dynamics in extended films
and the full layer structures of magnetic tunnel junctions. However, GPU-based
computing requires specialized hardware. Furthermore, current GPU-based
simulators are based on the CUDA software library, which is restricted to
NVIDIA-manufactured hardware, further limiting their accessibility. Here we
discuss an approach we have developed for running micromagnetic simulations on
cloud computing services, thereby eliminating the need to purchase and maintain
dedicated GPU hardware. We present open-source software that allows researchers
who are unfamiliar with GPUs and cloud computing to readily perform
cost-efficient micromagnetic simulations from any computer. Finally, we analyze
the conditions under which the GPU-based approach confers an advantage over
CPU-based simulations.

While a number of GPU-accelerated micromagnetic packages have been
developed, we focus on the open-source project MuMax3[6, 7]. Open-source codes
are free of the licensing restrictions of commercial packages that often preclude
their execution in cluster environments or on cloud computing platforms.
Moreover, the availability of the source code to the scientific community allows
the underlying mechanics of the simulations to be scrutinized if doubts are raised
about the results.

2. Cloud computing services 

Cloud computing services are a comprehensive set of tools for performing
computations on hardware resources that are offered over the Internet [11].
Providers sell access to virtual computers, known as instances, that run on their
hardware and can be launched on an on-demand or reservation basis. Instances
come in a variety of hardware configurations that are reflected in their hourly
prices. When using cloud computing services to perform a GPU-based
micromagnetic simulation, a user first launches a GPU instance on the provider’s
servers. Rather than having to install the necessary software after every launch, 
instances can instead be based on previously created “images” that have the 
micromagnetic and supporting packages pre-installed. Simulation input files 
are transferred to the running instance, and the simulation runs on the remote 
hardware until completion. The data is then transferred back to the user’s local 
computer. At this point the instance can be stopped to avoid incurring further 
hourly charges, or kept open to continue with other simulations. 

Here we investigate the use of the GPU instance type offered by Amazon 
Web Services (AWS)[12]. At the time of this writing, this instance type has 
an NVIDIA GRID K520 GPU card, with 1536 CUDA cores and 4 GB of video 
memory for an hourly price of $0.65. A consumer NVIDIA card with comparable 
specifications, the NVIDIA GTX 770, retails for $310 ± 20. At the lower bound, 
where only the GPU cost is considered, a researcher would need to perform over 
480±30 hours of simulations on AWS before the cost of the graphics card is 
recovered. This calculation considers neither the cost of a desktop computer 
able to house the GPU nor the ongoing maintenance costs thereof. AWS has 
upgraded their GPU instance hardware and reduced their pricing in the past, 
so the cloud-based solution may compare even more favorably in the future. 
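
The break-even estimate above is a simple division, and can be checked with a short Python sketch. The function name is illustrative; the inputs are the card price, its uncertainty, and the hourly rate quoted in the text, and the uncertainty is propagated by plain proportional scaling.

```python
def break_even_hours(card_price, price_uncertainty, hourly_rate):
    """Return (hours, uncertainty) of cloud use that costs as much as the card."""
    hours = card_price / hourly_rate
    uncertainty = price_uncertainty / hourly_rate
    return hours, uncertainty


# Figures quoted in the text: a $310 +/- 20 card and a $0.65 hourly rate.
hours, spread = break_even_hours(310.0, 20.0, 0.65)
print(f"break-even after about {hours:.0f} +/- {spread:.0f} hours")
```

Rounded to two significant figures, this reproduces the 480±30 hours quoted above.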

Figure 1: Illustration of MuCloud use. (a) On the local computer, a normal MuMax3 input file
is passed into MuCloud, which (b) connects to a new or existing AWS GPU instance. (c) The
input file is transferred to the remote instance and MuMax3 is started. (d) The simulation
runs on the remote GPU, (e) during which time the web browser interface of MuMax3 is
accessible on the local computer. (f) Upon completion, output files generated by MuMax3
are transferred back to the local computer by MuCloud.

Another potential advantage of running GPU-based simulations in cloud
computing environments is the opportunity for parallelism with no up-front
costs. A single MuMax3 simulation fully occupies a GPU during execution, so
that the number of simulations that can be run simultaneously is limited to the
number of available GPUs. Purchasing and maintaining an array of GPUs is
unlikely to prove cost effective in typical use cases, especially since simulation
workloads are often sporadic. On the other hand, each researcher using AWS
can launch 5 parallel instances on-demand or up to 20 instances on a reservation
basis[12]. This easily-scaled computing capacity, which has largely motivated
the interest in cloud-computing platforms for web services, can offer significant
speed-ups when running large batches of simulations without incurring anything
beyond the standard hourly instance charges.
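
The trade-off described above can be illustrated with a minimal Python sketch. It assumes that every simulation in a batch takes the same time, that each instance runs one simulation at a time, and that charges are proportional to instance-hours (any per-hour rounding by the provider is ignored). The batch size and run time are hypothetical; only the hourly rate and the instance limits come from the text.

```python
import math


def batch_estimate(n_simulations, hours_per_simulation, n_instances, hourly_rate):
    """Estimate wall-clock time and instance-hour cost for a batch of runs."""
    # Simulations are processed in rounds of at most n_instances at a time.
    rounds = math.ceil(n_simulations / n_instances)
    wall_hours = rounds * hours_per_simulation
    # Total instance-hours do not depend on how many instances share the work.
    cost = n_simulations * hours_per_simulation * hourly_rate
    return wall_hours, cost


# Hypothetical batch: 20 simulations of 2 hours each at the quoted $0.65/hour.
for n_instances in (1, 5, 20):
    wall, cost = batch_estimate(20, 2.0, n_instances, 0.65)
    print(f"{n_instances:2d} instances: {wall:5.1f} h wall time, ${cost:.2f}")
```

Under these assumptions the wall-clock time falls with the number of instances while the total charge stays the same.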

3. MuCloud Software 

We have developed an open-source Python script, MuCloud, that runs
MuMax3 simulations on AWS GPU instances irrespective of the user’s local
operating system. This code can be obtained on GitHub[13] along with full
documentation regarding its installation, capabilities, and use. The operation of the
script is detailed in Figure 1. For security purposes, Secure Shell (SSH) and 
Secure File Transfer Protocol (SFTP) ensure that all data is encrypted while 
passing between the local and remote computers. The MuMax3 web interface 
is accessible so that the local user can control and monitor the simulations on 
the remote instance in real time. 
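
The upload, run, and download cycle of Figure 1 can be sketched with the standard OpenSSH command-line clients. This is not MuCloud's implementation: it only builds the command lines, uses scp where the text mentions SFTP, and treats the user name, host, key file, and input file name as hypothetical placeholders. The port 35367 is assumed to be the MuMax3 web interface default, and the output directory is assumed to be the input file name with an ".out" suffix.

```python
import shlex
from pathlib import PurePath


def build_commands(host, key_file, input_file, user="ubuntu", http_port=35367):
    """Return argv lists for one upload / run / download cycle (nothing is executed)."""
    target = f"{user}@{host}"
    name = PurePath(input_file).name                  # e.g. "reversal.mx3"
    out_dir = PurePath(name).with_suffix(".out").name  # assumed MuMax3 output directory
    # Copy the input file to the remote home directory.
    upload = ["scp", "-i", key_file, input_file, f"{target}:{name}"]
    # Run MuMax3 remotely and forward the web interface port to the local machine.
    run = ["ssh", "-i", key_file,
           "-L", f"{http_port}:localhost:{http_port}",
           target, f"mumax3 {shlex.quote(name)}"]
    # Fetch the whole output directory once the run has finished.
    download = ["scp", "-r", "-i", key_file, f"{target}:{out_dir}", "."]
    return upload, run, download


# Hypothetical host and key; print the commands instead of running them.
for argv in build_commands("gpu.example.com", "key.pem", "reversal.mx3"):
    print(shlex.join(argv))
```

Each list could be passed to `subprocess.run` to carry out the step; while the second command is running, the web interface would be reachable in a local browser at the forwarded port.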


4. Performance 

Figure 2: (a) Geometry of the field-reversal problem for comparing solvers. (b) Micromagnetic
texture during field-reversal. (c) Run time required to simulate 1 ns of the problem using
either (blue) MuMax3 or (red) OOMMF, as a function of the number of simulation cells
($2N \times N \times 1$). (solid circles) Zero-temperature simulations are performed with
Dormand-Prince (RK45) evolution, while (open squares) finite-temperature (77 K) simulations
use the Heun method. (black dashed line) Simulations approach $O(N \log N)$ complexity at
high cell count.

We quantify the performance of MuMax3 on AWS GPU instances by timing
the execution of a magnetic-field-driven reversal simulation for a variety
of simulated sample sizes. We compare to the performance of a CPU-based
solver, OOMMF[14] (running on 4 threads), and determine the system size
regime where the GPU-based cloud-computing approach has superior
performance. A simple magnetic-field-driven reversal problem is chosen for the
performance benchmark because micromagnetic packages tend to diverge in their
implementation of more complex phenomena (e.g., the inclusion of spin-transfer
torque). No attempts are made to optimize the execution in either simulator,
and therefore we expect that this comparison is accurate for typical use cases.
Our analysis differs from those presented by Arne et al. [6] and Lopez-Diaz et
al. [10] in that we consider the total simulation execution time, instead of the
solver step time.
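
Measuring total execution time, as opposed to solver step time, amounts to timing the whole simulator process from launch to exit. The Python sketch below shows one way this could be done; it is not the benchmark harness used for the results reported here, and a trivial stand-in command replaces the simulator so that the example runs anywhere.

```python
import subprocess
import sys
import time


def time_total_execution(argv):
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises if the command exits with an error
    return time.perf_counter() - start


# Stand-in command; a real benchmark would pass the simulator invocation
# instead (for example ["mumax3", "reversal.mx3"], a hypothetical file name).
elapsed = time_total_execution([sys.executable, "-c", "pass"])
print(f"total execution time: {elapsed:.3f} s")
```

Timing the full process includes start-up, initialization, and output costs that a per-step timing would leave out.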

Figure 3: The ratio of the CPU and GPU run times associated with simulating the problem
with either (red) zero-temperature with Dormand-Prince (RK45) or (blue) finite-temperature
(77 K) with the Heun method. Above approximately 5,000 cells, the large parallelism of
MuMax3’s GPU-based solvers provides a performance advantage compared to the CPU-based
simulation.

Our benchmark problem examines an elliptical thin-film Permalloy (NiFe)
nanomagnet as illustrated in Figure 2(a). We use an exchange interaction
strength of $A_\mathrm{ex} = 13 \times 10^{-12}$ J/m, saturation magnetization of
$M_s = 800 \times 10^{3}$ A/m, and an initial magnetization ($\mathbf{m}_0$) saturated
in the $+x$ direction. The magnet has an aspect ratio of 2, with sizes ranging from
$16 \times 8 \times 2$ nm$^3$ to $4000 \times 2000 \times 2$ nm$^3$. The simulation
region is discretized into $2N \times N \times 1$ cubes of 8 nm$^3$ volume. $N$ is
restricted to multiples of 8 to avoid performance penalties in the fast-Fourier
transform (FFT) algorithm that arise from non-“seven smooth” system dimensions.
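
A number is "seven smooth" when it has no prime factor larger than 7. The following Python sketch shows how a grid dimension could be checked for this property and how the cell count of the $2N \times N \times 1$ grid follows from $N$. The function names and sample values of $N$ are illustrative only.

```python
def is_seven_smooth(n):
    """Return True if n has no prime factor larger than 7."""
    if n < 1:
        return False
    # Divide out every factor of 2, 3, 5 and 7; anything left is a larger prime factor.
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    return n == 1


def cell_count(n):
    """Number of cells in the 2N x N x 1 grid described in the text."""
    return 2 * n * n


# Report the cell count and the smoothness of both in-plane dimensions.
for n in (8, 64, 88, 96):
    print(n, cell_count(n), is_seven_smooth(n), is_seven_smooth(2 * n))
```

Note that 88 = 8 × 11 is a multiple of 8 that is not seven smooth, so an explicit check on both $2N$ and $N$ may be a useful complement when choosing grid sizes.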

We test the zero- and finite-temperature performance using the
Dormand-Prince (RK45) and Heun methods, respectively, for evolving the
Landau-Lifshitz equation. We apply an external magnetic field of
$\mathbf{H}_\mathrm{ext} = -60\,\hat{\mathbf{x}} + 20\,\hat{\mathbf{y}}$ mT with
a step-like time dependence, which causes the sample magnetization to reverse
by domain nucleation and propagation following smaller amplitude precessional
pre-switching oscillations. Figure 2(b) illustrates the micromagnetic texture of
the magnetization during the reversal process.
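
For orientation, the Python sketch below generates a MuMax3 input script for a problem of this kind. It is a hedged reconstruction, not the input file used for the benchmark. The material parameters, field, temperature, and solver choices are taken from the text, and the 2 nm cell edge follows from the stated 8 nm$^3$ cell volume. The damping constant is not stated in the text and must be supplied by the caller, and the optional fixed time step is an assumption for MuMax3 versions that may require one when thermal fields are enabled.

```python
def make_mx3(n, alpha, temperature=0.0, cell=2e-9, fixed_dt=None):
    """Return MuMax3 script text for a 2N x N x 1 elliptical-magnet reversal run."""
    lines = [
        f"SetGridSize({2 * n}, {n}, 1)",
        f"SetCellSize({cell:g}, {cell:g}, {cell:g})",
        f"SetGeom(Ellipse({2 * n * cell:g}, {n * cell:g}))",
        "Msat = 800e3",                      # saturation magnetization (A/m)
        "Aex = 13e-12",                      # exchange stiffness (J/m)
        f"alpha = {alpha:g}",                # damping: NOT given in the text
        "m = Uniform(1, 0, 0)",              # initial state saturated along +x
        "B_ext = vector(-60e-3, 20e-3, 0)",  # step-like field, in tesla
    ]
    if temperature > 0:
        # Finite temperature: Heun solver, optionally with a fixed time step.
        lines += [f"Temp = {temperature:g}", "SetSolver(2)"]
        if fixed_dt is not None:
            lines.append(f"FixDt = {fixed_dt:g}")
    else:
        # Zero temperature: Dormand-Prince (RK45) solver.
        lines.append("SetSolver(5)")
    lines.append("Run(1e-9)")                # simulate 1 ns
    return "\n".join(lines) + "\n"


# Hypothetical damping value, for illustration only.
print(make_mx3(64, alpha=0.02))
print(make_mx3(64, alpha=0.02, temperature=77, fixed_dt=1e-14))
```

The returned text could be written to a file with an ".mx3" extension and passed to MuMax3, either locally or through a remote workflow.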

Figure 2(c) illustrates the execution time required to simulate 1 ns of the 
field-reversal problem. The additional burden of computing thermal fields at 
finite temperatures raises the execution time by almost two orders of magnitude, 
but does not significantly change the qualitative dependence on the system size. 
Figure 3 shows the ratio between the CPU and GPU run times. Below
5,000 cells, the CPU simulations have better performance. In our problem,
this corresponds to a magnet with dimensions at or below $128 \times 64 \times 2$ nm$^3$
($N = 64$). In this regime the reduced speed of the GPU clock compared to
the CPU, the delays for memory transfers to and from GPU memory, and other
execution latencies limit the performance of the graphics card, since its intrinsic
parallelism is not fully exploited. In larger simulations, however, the
GPU-based simulations provide a speed-up factor of 7 to 10. Both the CPU and
GPU calculations exhibit similar $O(N \log N)$ complexity for large sizes, which is
expected from the FFT operations involved in calculating the demagnetization
field. This causes the speed-up factor associated with using the GPU to saturate
for large systems.
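
The crossover and saturation described above can be reproduced qualitatively with a toy cost model. The Python sketch below assumes that both run times grow as $n \log n$ in the number of cells $n$, and that the GPU additionally pays a fixed overhead for latencies and memory transfers. The constants are arbitrary illustrative values, not fits to the measured data.

```python
import math


def run_time(cells, per_cell_cost, fixed_overhead=0.0):
    """Toy run-time model: a fixed overhead plus an O(n log n) term."""
    return fixed_overhead + per_cell_cost * cells * math.log2(cells)


def speed_up(cells, cpu_cost, gpu_cost, gpu_overhead):
    """Ratio of modeled CPU run time to modeled GPU run time."""
    return run_time(cells, cpu_cost) / run_time(cells, gpu_cost, gpu_overhead)


# Illustrative constants only (arbitrary units, not fitted to the benchmark).
CPU_COST, GPU_COST, GPU_OVERHEAD = 10.0, 1.0, 5.0e5

for cells in (10**3, 10**4, 10**5, 10**6, 10**7):
    ratio = speed_up(cells, CPU_COST, GPU_COST, GPU_OVERHEAD)
    print(f"{cells:>9d} cells: speed-up {ratio:5.2f}")
```

In this model the ratio is below one for small systems, where the fixed overhead dominates, and approaches the constant `CPU_COST / GPU_COST` for large systems, where the shared $n \log n$ scaling cancels.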

Although current micromagnetic simulation packages are hard-coded to
utilize either CPU or GPU resources, it should be possible to utilize a heterogeneous
computing approach to enable seamless and intelligent transitions between
processor types based on the geometry and characteristics of the simulated system.
While CPU-based simulation codes remain the better choice for conducting
small simulations, the nearly order-of-magnitude reduction in simulation time 
for large systems constitutes a significant advantage to using GPU-accelerated 
micromagnetics on cloud computing services. 

5. Conclusion 

Cloud computing services provide a means for researchers to obtain the 
performance enhancements of GPU-based micromagnetic simulations without 
investing in specialized computer hardware. This opens new possibilities such 
as simultaneous simulations across a large number of remote instances. We 
present an open-source program (MuCloud) that allows MuMax3 simulations 
to be run on AWS instances, so that researchers can easily access this new 
avenue for micromagnetics. With these tools, we demonstrate that a nearly
tenfold performance enhancement can be obtained over CPU-based micromagnetic
codes when simulating large systems. 

We acknowledge Barry Robinson and Jim Entwood for introducing us to 
Amazon Web Services. This research was supported by the NSF (DMR-1010768) 
and IARPA (W911NF-14-C-0089). 